[draft] Implement SOZip storage of terra targets #62
base: master
…custom resources format
- TODO: FGB via /vsizip/ seems to return vector data in different order
Apparently the Parquet driver is not available in the GDAL build used on GHA, so I used FlatGeobuf instead, which mostly works (the data appears to come back in a different order; GDAL bug?). The snapshots also apparently differ slightly depending on the runner. I don't have any snapshot changes to accept locally, so we may need to relax these specific tests (as appears to have been done in other recent cases).
EDIT: GDAL >= 3.1 is required for FlatGeobuf: https://gdal.org/drivers/vector/flatgeobuf.html
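Since the FlatGeobuf round trip appears to return features in a different order, one way to relax the affected tests is to compare the data after sorting on a stable key. A minimal base R sketch, assuming the attribute table has been pulled into a data frame and that a unique `id` column exists (`same_rows_any_order()` and `id` are hypothetical names, not part of this PR):

```r
# Compare two data frames while ignoring row order.
# `key` names a column that uniquely identifies each row (assumed to exist).
same_rows_any_order <- function(x, y, key) {
  if (nrow(x) != nrow(y) || !setequal(names(x), names(y))) {
    return(FALSE)
  }
  # Sort both by the key and align the column order before comparing
  x_sorted <- x[order(x[[key]]), names(x), drop = FALSE]
  y_sorted <- y[order(y[[key]]), names(x), drop = FALSE]
  # Drop row names so they do not affect the comparison
  rownames(x_sorted) <- NULL
  rownames(y_sorted) <- NULL
  isTRUE(all.equal(x_sorted, y_sorted))
}

written <- data.frame(id = 1:3, name = c("a", "b", "c"))
read_back <- written[c(3, 1, 2), ]  # same rows, shuffled order

same_rows_any_order(written, read_back, key = "id")
```

For a SpatVector, the attribute table could be obtained with `as.data.frame()`; comparing geometries would need an additional step that this sketch does not cover.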
For other formats, a more "generic" write and read function could be used:

    write_to_zip <- function(object, path) {
      # rename path so it is not confused with fs::path(), just for readability
      out_path <- path
      # work in a fresh local tempdir that disappears when the function is done
      tmp <- withr::local_tempdir()
      raster_dir <- fs::path(tmp, fs::path_dir(out_path))
      fs::dir_create(raster_dir)
      # write the raster (hard-coded options for demonstration)
      terra::writeRaster(object,
                         fs::path(tmp, out_path),
                         filetype = "GTiff",
                         overwrite = TRUE)
      # figure out which files got written
      raster_files <- fs::dir_ls(raster_dir)
      # package those into a zip file using `zip::zip()`;
      # the ".zip" suffix keeps the archive from clashing with the raster file
      zip_tmp <- fs::path(tmp, paste0(fs::path_file(out_path), ".zip"))
      zip::zip(
        zip_tmp,
        files = fs::path_file(raster_files),
        compression_level = 1,
        root = raster_dir
      )
      # move the zip file to out_path as the expected output
      fs::file_move(zip_tmp, out_path)
    }

    read_from_zip <- function(path) {
      tmp <- withr::local_tempdir()
      # extract into tempdir
      zip::unzip(zipfile = path, exdir = tmp)
      # read in as rast; note that tmp is removed when this function exits,
      # so a file-backed SpatRaster may need its values in memory before returning
      terra::rast(fs::path(tmp, fs::path_file(path)))
    }

I'm not sure how these compare to SOZip in terms of read and write overhead, but I assume they are worse.
This is a draft PR to implement an option (`zipfile=TRUE`) to write/read Seek-Optimized ZIP (SOZip) files in the target store for #37. This works for the tar_terra* SpatVector and SpatRaster methods.

Rather than attempting to create (non-SOZip) ZIP files independently using `utils::zip()` or similar, this PR uses one of two (simpler, but somewhat more limited) pathways available through existing GDAL drivers:

- Drivers that support direct write of SOZip via a dedicated file extension (ESRI Shapefile and GeoPackage, as of GDAL 3.7) use that path for writing. So, ESRI Shapefile uses the .shz extension (as it already does), and GeoPackage uses .gpkg.zip. Read is done via /vsizip/...
- All other drivers use the generic `"/vsizip/{path/to/zipfile/target}/target"` data source path for both write and read. This is not supported by all drivers currently, but works for things like Parquet and GeoTIFF. GeoTIFF requires specific GDAL options (STREAMABLE_OUTPUT=YES, COMPRESS=NONE).

Examples of the above cases have been added to tests.
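The two pathways described above can be sketched as a small path-building helper. This is only an illustration under stated assumptions: `sozip_dsn()` is a hypothetical name, the extension is assumed to be appended to a base path, and the PR's actual implementation may differ. The braces in the `/vsizip/` branch are GDAL's alternate syntax for archives whose name does not end in `.zip`.

```r
# Build the data source name (DSN) used to write a target as SOZip.
# `sozip_dsn()` is a hypothetical helper; `driver` is a GDAL short name.
sozip_dsn <- function(zip_path, driver, inner_name = basename(zip_path)) {
  switch(
    driver,
    # Pathway 1: drivers that write SOZip directly based on the extension
    "ESRI Shapefile" = paste0(zip_path, ".shz"),
    "GPKG" = paste0(zip_path, ".gpkg.zip"),
    # Pathway 2: everything else goes through the /vsizip/ virtual file system
    paste0("/vsizip/{", zip_path, "}/", inner_name)
  )
}

# Creation options the PR description lists for GeoTIFF written via /vsizip/
geotiff_sozip_options <- c("STREAMABLE_OUTPUT=YES", "COMPRESS=NONE")

sozip_dsn("_targets/objects/elev", "GTiff")
sozip_dsn("_targets/objects/roads", "GPKG")
```

With terra, the options vector would presumably be passed through the `gdal` argument of `writeRaster()`, with the DSN from the helper as the file name.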
Raster:
Vector:
In the future, {gdalraster} (#48) could be used to assemble SOZip files. This would allow sidecar files to be included (it appears they are not stored when writing via /vsizip/ at this time; this needs investigation) and would support drivers that cannot write SOZip directly.
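As a rough sketch of what that could look like, the helper below bundles a main file and its sidecars into a single archive with `gdalraster::addFilesInZip()`. It assumes {gdalraster} is installed and built against GDAL >= 3.7, and the argument names follow my reading of that function's documentation; `assemble_sozip()` itself is a hypothetical name and not part of this PR.

```r
# Hypothetical helper: bundle a main file and its sidecars into one SOZip.
# Assumes {gdalraster} is installed and built against GDAL >= 3.7.
assemble_sozip <- function(files, zip_path) {
  missing_files <- files[!file.exists(files)]
  if (length(missing_files) > 0) {
    stop("Files not found: ", paste(missing_files, collapse = ", "))
  }
  if (!requireNamespace("gdalraster", quietly = TRUE)) {
    stop("Package 'gdalraster' is required to assemble SOZip files.")
  }
  gdalraster::addFilesInZip(
    zip_file = zip_path,
    add_files = files,
    overwrite = TRUE,
    full_paths = FALSE,     # store only file names, not directory paths
    sozip_enabled = "YES",  # always build the seek-optimized index
    quiet = TRUE
  )
  invisible(zip_path)
}
```

A caller would first write the dataset to a temporary directory, list every file the driver produced (including sidecars), and pass that vector as `files`.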